
Speedy Site Prospecting Using Social Metrics & Natural Language Processing

We can all agree that a large time sink in outreach link building is site prospecting. Who really wants to spend the time going through a bunch of pages on a site to figure out whether the site is worthwhile? Further, if you’re following the “Throw Away Your Form Letters” principles, then you are looking for content on a blogger’s or webmaster’s site that is of interest in order to start a conversation – but that takes a lot of time too. It would be awesome to scale that process, wouldn’t it? Now, there’s an app for that.

I had the idea that if I spidered a site, matched the URLs with social metrics, and then used natural language processing to figure out the core concepts of every page, I could tell at a glance whether a site is worth my time and what content (if any) is popular. Note: I have purposely left Linkscape’s metrics out of this, as I don’t believe we should waste API calls on what may be many worthless pages. Identify the worthwhile pages first and then head over to Open Site Explorer. Sound good? OK, let’s do this!


Natural Language Processing Explained

Natural language processing (NLP) is a family of techniques, many of them based on machine learning, in which an application algorithmically analyzes text to extract core concepts and, in effect, determine what a page is about. This type of distillation is the bridge between the written document and the program that allows a computer to “understand” content. As you can guess, this is something that Google strongly leverages, as can be seen in the “Systems and Methods for Inferring Concepts for Association with Content” patent from 2004.
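To make “extracting core concepts” a little more concrete, here is a toy sketch in Python. It is a crude stand-in that just counts frequent words; it is not how Textwise or Google works, and the function name `top_terms` and the stop-word list are my own assumptions for illustration.

```python
import re
from collections import Counter

# A tiny, hand-picked stop-word list (an assumption; real tools use far larger ones).
STOP_WORDS = {
    "a", "an", "and", "the", "of", "to", "in", "is", "it", "for",
    "on", "was", "with", "that", "this", "were",
}


def top_terms(text, n=3):
    """Return the n most frequent non-stop-words in text.

    This is plain term counting, a rough proxy for "concepts".
    Ties are returned in the order the words first appear.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(n)]


if __name__ == "__main__":
    sample = (
        "The music was loud. The puppets danced to the music, "
        "and the music never stopped. Puppets everywhere."
    )
    print(top_terms(sample, n=2))
```

A real concept extractor goes well beyond this (it maps words to topics and handles ambiguity), but the input and output have the same shape: text goes in, a short list of terms comes out.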


There are a variety of awesome APIs that do natural language processing, but for this we will be using Textwise simply because it’s entirely free.


Textwise Screenshot of Rand's Post


Of course, Rand is a pro, so he titles his posts properly; the concepts “creative,” “typography” and “ux” therefore come as no surprise here. But with less savvy writers and people who write more colorfully, you may not be able to tell what a page is about from just the title. The concepts also give you a better sense of what keywords or topics a computer will associate with a given page.


The next example is a page from QN5 Music (full disclosure: I do music with these guys and they are incredible) where the title “Thank You for An Incredible Evening” is somewhat vague.


QN5 Music Post Screenshot


The post is a recap of their 2011 Megashow, but there’s no meta description, so you may not be able to tell what the page is about when prospecting from an Excel sheet generated by Screaming Frog. Now let’s couple that with Textwise concepts:


QN5 Post Textwise Screenshot


At a glance you can guess that the page is about some sort of incredible music performance and that there were puppets involved. You’d be wrong to think that it was rock music, though; that’s just the gift and the curse of ambiguous words. In other words, after 5 seconds you are about 90% correct as to what the page is about without ever looking at it.


Your New Best Friend: SiteSkout


SiteSkout Screenshot


SiteSkout is a brand new tool I wrote in PHP. It spiders a site, retrieves social metrics, scrapes the page title and meta description, and pings Textwise for concepts and categories. It shows you all that awesome information as it happens and then exports it to a CSV file for download and Excel ninjitsu. (*dusts off shoulder*)
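For the curious, the scrape-and-export part of that pipeline can be sketched in a few lines. This is a minimal Python sketch using only the standard library, not SiteSkout’s actual PHP code. The class and function names and the CSV columns (`url`, `title`, `description`) are assumptions for illustration, and the social-metrics and Textwise steps are left out entirely.

```python
import csv
import io
from html.parser import HTMLParser


class TitleMetaParser(HTMLParser):
    """Collects the <title> text and the meta description of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text that sits inside <title>...</title> is kept.
        if self._in_title:
            self.title += data


def scrape_title_and_description(html):
    """Return (title, meta description); either may be an empty string."""
    parser = TitleMetaParser()
    parser.feed(html)
    return parser.title.strip(), parser.description


def rows_to_csv(rows):
    """Serialize a list of dicts with keys url, title, description to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["url", "title", "description"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    page = "<html><head><title>Thank You for An Incredible Evening</title></head></html>"
    title, description = scrape_title_and_description(page)
    print(rows_to_csv([{"url": "http://example.com/post", "title": title, "description": description}]))
```

The sample page has a title but no meta description, so its description column comes out empty, which is exactly the situation where the concepts earn their keep.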


SiteSkout Post-Run


There are a few options that will affect the speed at which this all happens. You can have it spider a site from a given URL just like you would with Screaming Frog or Xenu, but be warned: single-threaded spidering is slow. So I would suggest you use Screaming Frog for your spidering and just dump the URLs into SiteSkout, or point it at the HTML or XML sitemap. Either way, it will fetch only those URLs for scraping instead of crawling through every link trying to discover the URL of every page on the site.
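To see why the sitemap route is so much faster, note that an XML sitemap already is the URL list, so there is nothing to discover. Here is a minimal Python sketch of pulling the URLs out of one. The function name `urls_from_sitemap` is my own, and this is an illustration rather than SiteSkout’s code; the namespace is the standard one from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Standard namespace used by XML sitemaps (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return every <loc> value found in the sitemap, in document order.

    Note: for a sitemap index file, the <loc> values are the URLs of
    other sitemaps, not of pages, so those would need a second pass.
    """
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


if __name__ == "__main__":
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>http://example.com/</loc></url>"
        "<url><loc>http://example.com/about</loc></url>"
        "</urlset>"
    )
    for url in urls_from_sitemap(sample):
        print(url)
```

One parse gives you the whole list, whereas a spider has to download and read every page just to find out which pages exist.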


Bring Your Own API

So while my “Using Social Media to Get Ahead of Search Demand” post may have underperformed by my personal standards (only 49 thumbs up), I learned a valuable lesson: if you put a tool on the front page of SEOmoz, you had better account for a very high number of API calls.


GoFish Traffic


So for SiteSkout I’m encouraging users to bring their own API keys. The tool is built on 4 keys, so it will run without one, but to ensure stability, sign up for your own Textwise API key.

  • Step 1: Register – Textwise has a painless registration process; all you need is a name and email address.
  • Step 2: Find your API key – Your API key is hidden away in your profile. Grab it and save it somewhere, like a text file.
Find your Textwise API here!
  • Step 3: Plug it into SiteSkout – SiteSkout will cookie your Textwise API key for you so you don’t have to enter it every time you use the tool.



My motto is “all actionable everything” so let’s talk about how this data will help you do more effective link building.


Prospecting a Site

The obvious application is that it helps you prospect a site. If you mash this data up with a Screaming Frog export, you get a macroscopic view of what the site is about at a glance and a microscopic view of what each page is about without ever visiting the site. Use VLOOKUP on the URLs to bring all the data together. I’d suggest using the heading tags, level, inlinks, outlinks, external outlinks and hash columns from Screaming Frog in concert with this.
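If you would rather script the mash-up than do it in Excel, the VLOOKUP step is just a join on the URL. Below is a minimal Python sketch under stated assumptions: the crawl export keys its rows on a column called `Address`, the SiteSkout export on a column called `url`, and the other column names (`Inlinks`, `concepts`, `tweets`) are hypothetical. Check the headers in your own files and adjust.

```python
import csv
import io


def join_on_url(crawl_csv, siteskout_csv, crawl_key="Address", skout_key="url"):
    """VLOOKUP-style join: add SiteSkout columns to each crawl row by URL.

    Both arguments are CSV text. Crawl rows with no SiteSkout match are
    kept with only their own columns (the equivalent of #N/A in Excel).
    """
    skout_by_url = {
        row[skout_key]: row
        for row in csv.DictReader(io.StringIO(siteskout_csv))
    }
    merged = []
    for row in csv.DictReader(io.StringIO(crawl_csv)):
        combined = dict(row)
        match = skout_by_url.get(row[crawl_key], {})
        for column, value in match.items():
            if column != skout_key:  # the URL is already in the crawl row
                combined[column] = value
        merged.append(combined)
    return merged


if __name__ == "__main__":
    crawl = "Address,Inlinks\nhttp://example.com/a,12\nhttp://example.com/b,3\n"
    skout = "url,concepts,tweets\nhttp://example.com/a,music; puppets,40\n"
    for row in join_on_url(crawl, skout):
        print(row)
```

The join is an exact string match, just like VLOOKUP with exact matching, so trailing slashes and http versus https differences between the two exports will cause misses.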


If you use a SiteSkout export in concert with the SEER OSE-Twitter link building methodology (I love that method so much) you can quickly figure out who follows you but doesn’t link to you and what existing page on a given site you should ask for a link from.


Outreach Material

In my eyes, the real power is that you now easily have something to talk to the webmaster or blogger about. At a glance you can determine the most popular content on the site, and the magic inherent in that is that social proof works both ways. That is to say, if something is popular, it makes sense that you would contact the writer about it. Your link target will be disarmed to a certain degree because they have very likely already received a lot of praise and correspondence via social media and email about their popular content. In short, SiteSkout helps you take the cold calling out of link building.
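Finding that most popular content is a simple sort once you have the export. Here is a minimal Python sketch, assuming each row carries per-network counts in columns such as `tweets` and `likes`. Those column names and the function name `most_popular` are hypothetical, and adding the counts together is just one reasonable way to score popularity.

```python
def most_popular(rows, metric_columns, top_n=5):
    """Return the top_n rows ranked by the sum of the given social counts.

    rows is a list of dicts (for example from csv.DictReader). Missing or
    blank counts are treated as zero; ties keep their original order.
    """
    def total(row):
        return sum(int(row.get(column) or 0) for column in metric_columns)

    return sorted(rows, key=total, reverse=True)[:top_n]


if __name__ == "__main__":
    pages = [
        {"url": "http://example.com/a", "tweets": "5", "likes": "1"},
        {"url": "http://example.com/b", "tweets": "50", "likes": ""},
        {"url": "http://example.com/c", "tweets": "0", "likes": "7"},
    ]
    for page in most_popular(pages, ["tweets", "likes"], top_n=2):
        print(page["url"])
```

The pages at the top of that list are your conversation starters; pair each one with its concepts and you know what to say before you ever open the site.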


Like I always say, Context is King!


I’d love to hear your thoughts and success stories with the tool in the comments below! There are bound to be some bugs in this, so please just hit me up on Twitter (@ipullrank) if anything goes wrong for you. I will continue to update these tools with your feedback until they are running super smoothly. Also, this tool is NOT an SEOmoz tool, and any errors or failures are my fault, not that of the wonderful team of developers at the Mozplex, so if you run into any problems, ping me, not them.

